137 research outputs found

    Self-Learning Classifier for Internet traffic

    Network visibility is a critical part of traffic engineering, network management, and security. Recently, unsupervised algorithms have been envisioned as a viable alternative for automatically identifying classes of traffic. However, the accuracy achieved so far does not allow their use for traffic classification in practical scenarios. In this paper, we propose SeLeCT, a Self-Learning Classifier for Internet traffic. It uses unsupervised algorithms along with an adaptive learning approach to let classes of traffic emerge automatically, so that they can be identified and (easily) labeled. SeLeCT automatically groups flows into pure (or homogeneous) clusters by alternating simple clustering and filtering phases to remove outliers. SeLeCT uses an adaptive learning approach to boost its ability to spot new protocols and applications. Finally, SeLeCT also simplifies label assignment (which is still based on some manual intervention) so that proper class labels can be easily discovered. We evaluate the performance of SeLeCT using traffic traces collected in different years from various ISPs located in 3 different continents. Our experiments show that SeLeCT achieves overall accuracy close to 98%. Unlike state-of-the-art classifiers, the biggest advantage of SeLeCT is its ability to help discover new protocols and applications in an almost automated fashion.

    Highlighter: automatic highlighting of electronic learning documents

    Electronic textual documents are among the most popular teaching content accessible through e-learning platforms. Teachers or learners with different levels of knowledge can access the platform and highlight portions of textual content that are deemed particularly relevant. The highlighted documents can be shared with the learning community in support of oral lessons or individual learning. However, highlights are often incomplete or unsuitable for learners with different levels of knowledge. This paper addresses the problem of predicting new highlights of partly highlighted electronic learning documents. With the goal of enriching teaching content with additional features, text classification techniques are exploited to automatically analyze portions of documents enriched with manual highlights made by users with different levels of knowledge and to generate ad hoc prediction models. Then, the generated models are applied to the remaining content to suggest highlights. To improve the quality of the learning experience, learners may explore highlights generated by models tailored to different levels of knowledge. We tested the prediction system on real and benchmark documents highlighted by domain experts and compared the performance of various classifiers in generating highlights. The achieved results demonstrated the high accuracy of the predictions and the applicability of the proposed approach to real teaching documents.

    Enhancing Interpretability of Black Box Models by means of Local Rules

    We propose a novel rule-based method that explains the prediction of any classifier on a specific instance by analyzing the joint effect of feature subsets on the classifier prediction. The relevant subsets are identified by learning a local rule-based model in the neighborhood of the prediction to explain. While local rules give a qualitative insight into the local behavior, their relevance is quantified by using the concept of prediction difference.
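The idea of a prediction difference for a feature subset can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the version below is an assumption: it compares the classifier output on the instance with the average output obtained when the chosen features are replaced by values drawn from a background sample. The names `prediction_difference`, `toy_model`, and `background` are hypothetical and used only for illustration.

```python
def prediction_difference(predict_proba, instance, background, subset):
    """Output on `instance` minus the mean output over copies of `instance`
    whose features in `subset` are replaced by values from `background` rows.
    (Assumed formulation; the paper's exact definition is not in the abstract.)"""
    original = predict_proba(instance)
    total = 0.0
    for row in background:
        perturbed = list(instance)
        for i in subset:
            perturbed[i] = row[i]  # overwrite only the features under study
        total += predict_proba(perturbed)
    return original - total / len(background)


def toy_model(x):
    """Toy classifier: predicts the class only when both features are set."""
    return 1.0 if x[0] == 1 and x[1] == 1 else 0.0


background = [[0, 0], [0, 1], [1, 0], [1, 1]]
instance = [1, 1]

d0 = prediction_difference(toy_model, instance, background, [0])
d1 = prediction_difference(toy_model, instance, background, [1])
d01 = prediction_difference(toy_model, instance, background, [0, 1])
print(d0, d1, d01)
```

In this toy case the joint subset `[0, 1]` yields a larger difference than either feature alone, which is the kind of joint effect a rule over several features is meant to capture.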

    NetCluster: a Clustering-Based Framework for Internet Tomography

    In this paper, Internet data collected via passive measurement are analyzed to obtain localization information on nodes by clustering (i.e., grouping together) nodes that exhibit similar network path properties. Since traditional clustering algorithms fail to correctly identify clusters of homogeneous nodes, we propose a novel framework, named “NetCluster”, suited to analyze Internet measurement datasets. We show that the proposed framework correctly analyzes synthetically generated traces. Finally, we apply it to real traces collected at the access link of our campus LAN and discuss the network characteristics as seen at the vantage point. I. INTRODUCTION AND MOTIVATIONS The Internet is a complex distributed system which continues to grow and evolve. The unregulated and heterogeneous structure of the current Internet makes it challenging to obtain

    Hierarchical Learning for Fine Grained Internet Traffic Classification

    Traffic classification remains a challenging problem given the ever-evolving nature of the Internet, in which new protocols and applications arise at a constant pace. In the past, so-called behavioral approaches have been successfully proposed as valid alternatives to traditional DPI-based tools to properly classify traffic into a few coarse classes. In this paper we push forward the adoption of behavioral classifiers by engineering a Hierarchical classifier that allows proper classification of traffic into more than twenty fine-grained classes. The engineering process covers both careful feature selection and the testing of seven different classification algorithms. Results obtained over large real data sets show that the proposed Hierarchical classifier outperforms off-the-shelf non-hierarchical classification algorithms by exhibiting average accuracy higher than 90%, with precision and recall that are higher than 95% for the most popular classes of traffic.

    Recommending Personalized Summaries of Teaching Materials

    Teaching activities are nowadays supported by a variety of electronic devices. Formative assessment tools allow teachers to evaluate the level of understanding of learners during frontal lessons and to tailor the next teaching activities accordingly. Although plenty of teaching materials are available in textual form, manually exploring these very large collections of documents can be extremely time-consuming. The analysis of learner-produced data (e.g., test outcomes) can be exploited to recommend short extracts of teaching documents based on the actual learner’s needs. This paper proposes a new methodology to recommend summaries of potentially large teaching documents. Summary recommendations are customized to each student’s needs according to the results of comprehension tests performed at the end of frontal lectures. Specifically, students undergo multiple-choice tests through a mobile application. In parallel, a set of topic-specific summaries of the teaching documents is generated. They consist of the most significant sentences related to a specific topic. According to the results of the tests, summaries are personally recommended to students. We assessed the applicability of the proposed approach in a real context, i.e., a B.S. university-level course. The results achieved in the experimental evaluation confirmed its usability.

    Network Digest analysis by means of association rules

    The continuous growth in connection speed allows huge amounts of data to be transferred through a network. An important issue in this context is network traffic analysis to profile communications and detect security threats. Association rule extraction is a widely used exploratory technique which has been exploited in different contexts (e.g., network traffic characterization). However, to discover (potentially relevant) knowledge, a very low support threshold needs to be enforced, thus generating an unmanageably large number of rules. To address this issue in network traffic analysis, an efficient technique to reduce traffic volume is needed. This paper presents the NEtwork Digest (NED) framework, which performs network traffic analysis by means of data mining techniques to characterize traffic data and detect anomalies. NED exploits continuous queries to efficiently perform real-time aggregation of captured network data and supports filtering operations to further reduce traffic volume by focusing on relevant data. Furthermore, NED provides an efficient algorithm to perform refinement analysis by means of association rules to discover traffic features. Extracted rules allow traffic data characterization in terms of correlation and recurrence of feature patterns. Preliminary experiments performed on different network dumps showed the efficiency and effectiveness of the NED framework in characterizing traffic data.
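The role of the support threshold mentioned above can be made concrete with a minimal sketch of the standard support and confidence measures for association rules. This is not the NED algorithm; the flow records and the helper names `support` and `confidence` are hypothetical and serve only to illustrate the definitions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).
    Assumes the antecedent occurs at least once (otherwise division by zero)."""
    joint = set(antecedent) | set(consequent)
    return support(transactions, joint) / support(transactions, antecedent)


# Hypothetical flow records, each reduced to a set of feature=value items.
flows = [
    frozenset({"proto=TCP", "dport=80"}),
    frozenset({"proto=TCP", "dport=80"}),
    frozenset({"proto=TCP", "dport=443"}),
    frozenset({"proto=UDP", "dport=53"}),
]

s = support(flows, {"proto=TCP", "dport=80"})
c = confidence(flows, {"proto=TCP"}, {"dport=80"})
print(s, c)
```

Lowering the minimum support admits rarer itemsets such as `{"proto=UDP", "dport=53"}`, which is why a very low threshold tends to inflate the number of extracted rules.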

    Predicting Your Next Stop-over from Location-based Social Network Data with Recurrent Neural Networks

    In recent years, Location-based Social Network (LBSN) data have strongly fostered a data-driven approach to the recommendation of Points of Interest (POIs) in the tourism domain. However, an important aspect that is often not taken into account by current approaches is the temporal correlation among POI categories in tourist paths. In this work, we collect data from Foursquare, extract timed paths of POI categories from sequences of temporally neighboring check-ins, and use a Recurrent Neural Network (RNN) to learn to generate new paths by training it to predict observed paths. As a further step, we cluster the data considering users’ demographics and learn separate models for each category of users. The evaluation shows the effectiveness of the proposed approach in predicting paths in terms of model perplexity on the test set.
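Model perplexity, the evaluation measure named above, can be sketched with its standard definition: the exponential of the average negative log-probability that the model assigns to each observed item. The function name `perplexity` and the example probabilities are hypothetical; the abstract does not report the actual values.

```python
import math


def perplexity(probs):
    """Perplexity of a sequence, given the probability the model assigned
    to each observed item (all probabilities must be greater than zero)."""
    avg_neg_log = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(avg_neg_log)


# A model that spreads its belief uniformly over 4 POI categories
# assigns probability 0.25 to every observed category.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A more confident model that is usually right scores lower (better).
confident = perplexity([0.9, 0.8, 0.9, 0.7])
print(uniform, confident)
```

A lower perplexity means the model is, on average, less surprised by the next POI category in a path; a uniform guess over k categories gives a perplexity of k.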

    NetCluster: A clustering-based framework to analyze internet passive measurements data

    Internet data collected via passive measurement are analyzed to obtain localization information on nodes by clustering (i.e., grouping together) nodes that exhibit similar network path properties. Since traditional clustering algorithms fail to correctly identify clusters of homogeneous nodes, we propose the novel NetCluster framework, suited to analyze Internet measurement datasets. We show that the proposed framework correctly analyzes synthetically generated traces. Finally, we apply it to real traces collected at the access link of the Politecnico di Torino campus LAN and discuss the network characteristics as seen at the vantage point.